pointer array. Often, as in the spreadsheet example, an easy
method exists to index the pointer array and link it with the
sparse-array elements. This makes accessing the elements of the
sparse array nearly as fast as accessing the elements of a normal
array. The linked-list version is very slow by comparison because
it must use a linear search to locate each element. Even if extra
information were added to the linked list to allow faster accessing,
it would still be slower than the pointer array's direct access.
The binary tree certainly speeds up the search time, but when
compared with the pointer array's direct-indexing capability it still
seems sluggish. If the hashing algorithm is properly chosen, the
hashing method can often beat the binary tree in access time, but
it will never be faster than the pointer-array approach.

When choosing an approach, the rule of thumb is to use the
pointer-array implementation when possible; it is the fastest in
terms of access time. If memory is in short supply, then you have
no choice but to use the linked-list or binary-tree approach.
Choosing an Approach



When deciding whether to use a linked list, a binary tree, a 
pointer array, or hashing to implement a sparse array, consider 
two factors: speed and memory efficiency. 

When the array is very sparse, the most memory-efficient
approaches are the linked list and the binary tree because only
array elements that are actually in use have memory allocated to
them. The links themselves require very little additional memory
and usually have a negligible effect. The pointer array requires
that the entire pointer array exist even if some of its elements are
not used. This means that not only must the entire pointer array
fit in memory, but enough memory must also be left over for the
application to use. This could be a serious problem in certain
applications, but it may not be a problem at all in others. Usually
you can decide whether the pointer array causes a problem for you
by calculating the approximate amount of free memory and
determining whether that is sufficient for your program. The hashing
method lies somewhere in the middle, between the pointer-array
approach and the linked-list or binary-tree approach. Although
the hashing method does require the existence of the entire
physical array (even if not all of it is used), the physical
array may still be smaller than a pointer array (which needs at
least one pointer for each logical-array location).

However, when the array is fairly full, the situation changes.
In this case the pointer array makes better use of memory. The
reason is that the tree and linked-list implementations need two
pointers for each element, whereas the pointer array needs only
one. For example, if a 1000-element array were full and pointers
were 2 bytes long, then both the binary tree and linked list would
use 4000 bytes for pointers. The pointer array, on the other hand,
would need only 2000, a savings of 2000 bytes. In the hashing
method even more memory is "wasted" to support the array, because
each element also stores its logical index and a chain link.
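
The arithmetic in this comparison can be checked with a short
sketch. The function names are hypothetical, the pointer size is
passed in so that the 2-byte pointers of the example can be
reproduced on any machine, and ANSI-style prototypes are used for
the benefit of current compilers.

```c
#include <assert.h>

/* Pointer overhead, in bytes, of a linked list or binary tree:
   two pointers for every element actually in use. */
long linked_overhead(long used, long ptr_size)
{
  assert(used >= 0 && ptr_size > 0);
  return used * 2 * ptr_size;
}

/* Pointer overhead, in bytes, of a pointer array: one pointer
   for every logical location, whether it is in use or not. */
long pointer_array_overhead(long logical, long ptr_size)
{
  assert(logical >= 0 && ptr_size > 0);
  return logical * ptr_size;
}
```

For a full 1000-element array with 2-byte pointers,
linked_overhead(1000, 2) is 4000 and pointer_array_overhead(1000, 2)
is 2000. Under these assumptions the two costs are equal when exactly
half of the logical locations are in use, which is one way to read
"fairly full."
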
By far the fastest approach, in terms of execution, is the 
char *cell_name;
{
  int h, loc;

  /* produce the hash value */
  loc = *cell_name - 'A';               /* column: A is 0 */
  loc += (atoi(&cell_name[1])-1) * 26;  /* 26 columns in each row */
  h = loc/10;

  /* return the value if found */
  if(hash[h].index==loc) return(hash[h].val);
  else { /* try next location */
    while(h<MAX) { /* follow the hash chain */
      h = hash[h].next;
      if(h==-1) break; /* not found */
      if(hash[h].index==loc) return(hash[h].val);
    }
    printf("not in array\n");
    return -1;
  }
}



Analysis of Hashing 

In its best case, which occurs rarely, each physical index created
by the hash is unique, and access times approximate those of direct
indexing. This means that no hash chains are created and all
lookups are essentially direct accesses. However, this is seldom the
case because it requires that the logical indexes be evenly
distributed throughout the logical-index space. In the worst case,
which is also rare, a hashed scheme degenerates into a linked list.
This can happen when the hashed values of the logical indexes are
all the same. In the average and most likely case, the hash method
can access any specific element in the time that it would take to
use a direct index multiplied by some constant that is proportional
to the average length of the hash chains. In using hashing to
support a sparse array, it is critical that the hashing algorithm
spread the physical indexes evenly so that long hash chains are
avoided. Also, hashing is best applied to situations in which you
know that there is a limit to the number of array locations
actually required.
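
The best and worst cases can be made concrete by counting how many
physical elements a lookup examines. The following is a minimal,
self-contained sketch and not the chapter's own listing: the names
table, table_init(), table_store(), logical_index(), and
probes_to_find() are hypothetical, ANSI-style prototypes are used,
and a colliding entry is assumed to be linked to the end of the
existing chain.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX 260                 /* physical array size, as in the text */

struct htype {
  int index;                    /* logical index, -1 if the slot is free */
  int val;                      /* stored value */
  int next;                     /* next slot in the chain, -1 at the end */
};

static struct htype table[MAX]; /* hypothetical name for the physical array */

/* mark every slot free and every chain empty */
void table_init(void)
{
  int i;

  for (i = 0; i < MAX; i++) {
    table[i].index = -1;
    table[i].next = -1;
    table[i].val = 0;
  }
}

/* cell name such as "C12" -> column + 26 * (row - 1) */
int logical_index(const char *cell_name)
{
  int col, row;

  assert(cell_name != NULL);
  col = cell_name[0] - 'A';
  row = atoi(&cell_name[1]);
  assert(col >= 0 && col < 26 && row >= 1 && row <= 100);
  return col + (row - 1) * 26;
}

/* store v; returns 0 on success, -1 if no free slot remains */
int table_store(const char *cell_name, int v)
{
  int loc = logical_index(cell_name);
  int h = loc / 10;
  int tail, free_slot;

  if (table[h].index == -1 || table[h].index == loc) {
    table[h].index = loc;
    table[h].val = v;
    return 0;
  }
  /* walk to the end of the chain, updating in place on a match */
  tail = h;
  while (table[tail].next != -1) {
    tail = table[tail].next;
    if (table[tail].index == loc) {
      table[tail].val = v;
      return 0;
    }
  }
  /* take the first free slot and link it to the chain */
  for (free_slot = 0; free_slot < MAX; free_slot++)
    if (table[free_slot].index == -1) break;
  if (free_slot == MAX) return -1;
  table[free_slot].index = loc;
  table[free_slot].val = v;
  table[tail].next = free_slot;
  return 0;
}

/* number of slots examined to find cell_name, or -1 if absent */
int probes_to_find(const char *cell_name)
{
  int loc = logical_index(cell_name);
  int h = loc / 10;
  int probes = 1;

  while (table[h].index != loc) {
    h = table[h].next;
    if (h == -1) return -1;
    probes++;
  }
  return probes;
}
```

If A1, K1, and U1 are stored in an empty table, each hashes to its
own slot and probes_to_find() reports 1 for each, which is the
direct-access best case. If A1 through J1 are stored instead, all
ten hash to slot 0 and form a single chain, so finding J1 examines
10 slots: for those keys the table has degenerated into a linked
list.
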
pointed to by the hashed value is occupied, then it searches for the
first free location. When a free location is found, the value of the 
logical index as well as the value of the array element are stored. 
It is necessary to store the logical index because it will be needed 
when that element is searched for. 



/* compute hash and store value */
void store(cell_name, v)
char *cell_name;
int v;
{
  int h, prior, loc;

  /* produce the hash value */
  loc = *cell_name - 'A';               /* column: A is 0 */
  loc += (atoi(&cell_name[1])-1) * 26;  /* 26 columns in each row */
  h = loc/10;

  /* store in the location unless full or
     store there if logical indexes agree - i.e., update.
  */
  if(hash[h].index==-1 || hash[h].index==loc) {
    hash[h].index = loc;
    hash[h].val = v;
  }
  else { /* try next location */
    /* follow the chain to its end; update if the index is found */
    while(hash[h].next!=-1) {
      h = hash[h].next;
      if(hash[h].index==loc) {
        hash[h].val = v;
        return;
      }
    }
    prior = h; /* last element of the chain */

    for(h=0; h<MAX; h++) /* find a free loc */
      if(hash[h].index==-1) break;

    if(h==MAX) {
      printf("hash error or array full\n");
      return;
    }

    hash[h].val = v;
    hash[h].index = loc;
    hash[prior].next = h; /* add the link */
  }
}



To find the value of an element already in the array, you must
first compute its physical index (its hash). Then check whether the
logical index stored at that location in the physical array matches
the logical index that is requested. If it does, then that value is
returned; otherwise, the chain is followed. The function find(),
which does this, is shown here.

/* compute hash and return value */ 
int find(cell_name)
Figure 20-3. A hashing example: how the logical index maps onto the
physical array 



The procedure store() converts a cell name into a hashed 
index into the array hash. Notice that if the location directly 
location, whether the pointers are pointing to actual information
or not. This may be a serious limitation for certain applications,
but in general it is not a problem.

Hashing 



Hashing is the process of extracting the index of an array element
directly from the information that will be stored there. The index
generated is called the hash. Traditionally, hashing has been
applied to disk files as a means of decreasing access time. However,
the same general methods can be used as a means of implementing
sparse arrays. The procedure used with the preceding example of a
pointer array used a special form of hashing called direct
indexing, in which each key maps onto one and only one array
location. That is, each hashed index is unique. (Note, however, that
the pointer-array approach does not require a direct-indexing
hash; it was just an obvious approach to the spreadsheet problem.)
However, in actual practice, such direct-hashing schemes
are few; a more flexible method is required. In this section you
will see how hashing can be generalized to allow greater power
and flexibility.

If you think about the spreadsheet example, it is clear that
even in the most rigorous environments not every cell in the sheet
will be used. For the sake of this example, assume that in almost
all cases no more than 10 percent of the potential locations are
occupied by actual entries. This means that if the spreadsheet has
the dimensions 26 by 100 (2,600 locations), then at most 260 will
be in use at any one time. Therefore, the size of the largest array
necessary to hold all the entries is 260 elements. The problem
then becomes this: How do the logical-array locations get mapped
onto and accessed from this smaller physical array? The answer is
the use of a hash chain.

When the user of the spreadsheet (the logical array) enters a 
formula for a cell, the cell location (which is defined by its name) 
is used to produce an index (a hash) into the smaller physical 
array. Assume that the physical array is called sheet. The index 
is derived from the cell name by converting the name into a 
number, as shown in the example of the pointer array. However, 
this number is then divided by 10 to produce an initial entry point
into the array. (Remember that in this example the physical array
is only 10 percent as big as the logical array.) If the location that
is referenced by this index is free, then the logical index and the
value are stored there. Otherwise, the array sheet is searched for
an open element. When an unused element is found, the information
is placed there and a pointer to this location is stored in the
original element. This situation is depicted in Figure 20-3.
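
The mapping from cell name to entry point can be sketched on its own.
The function names logical_index() and hash_index() and the constant
ROWS are hypothetical; the sketch assumes single-letter columns A
through Z, rows 1 through 100, and uses ANSI-style prototypes.

```c
#include <assert.h>
#include <stdlib.h>

#define WIDTH 26    /* columns A through Z */
#define ROWS 100    /* rows 1 through 100 (assumed name) */

/* Convert a cell name such as "C12" into its logical index,
   counting across each row: A1 is 0, Z1 is 25, A2 is 26. */
int logical_index(const char *cell_name)
{
  int col, row;

  assert(cell_name != NULL);
  col = cell_name[0] - 'A';
  row = atoi(&cell_name[1]);
  assert(col >= 0 && col < WIDTH && row >= 1 && row <= ROWS);
  return col + (row - 1) * WIDTH;
}

/* Initial entry point into the physical array, which is
   one tenth the size of the logical array. */
int hash_index(int loc)
{
  return loc / 10;
}
```

With this mapping, A1 through J1 have logical indexes 0 through 9 and
all share entry point 0, so they collide with one another; K1 (logical
index 10) is the first cell with entry point 1. The last cell, Z100,
has logical index 2599 and entry point 259, the final element of a
260-element physical array.
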

To find an element in the physical array given the logical-array
index, you first transform the logical index into its hash
value. Then check the physical array at the index generated by
the hash to see if the logical index stored there matches the one
that you are searching for. If it does, then return the information.
Otherwise, follow the hash chain until either the proper index is
found or the end of the chain is reached.

To see how this procedure is actually applied to the spreadsheet
program, you must define the following array of structures,
which acts as the physical array:



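#include <stdio.h>   /* printf(), used by store() and find() */
#include <stdlib.h>  /* atoi(), used to read the row number */

#define MAX 260

struct htype {
  int index;  /* logical index of the element */
  int val;    /* actual value of the array element */
  int next;   /* index of next value with same hash */
} hash[MAX];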



Before this array can be used, it must be initialized. To indicate 
an empty element, the following function initializes the index 
field to -1, a value that by definition cannot be generated. The -1 
in the next field is used to indicate the end of a hash chain. 

/* init the hash array */
void init()
{
  register int i;

  for (i=0; i<MAX; i++) {
    hash[i].index = -1;
    hash[i].next = -1;  /* null chain */
    hash[i].val = 0;
  }
}